fix(search): resolve zero results bug and improve UX#20
Merged
Conversation
- Fix L2 distance conversion in LanceDBVectorStore * Use exponential decay: score = e^(-distance^2) * Provides scores 0-1 range with better distribution * Fixes issue where all results were filtered out - Fix CLI metadata field reference * Changed metadata.file to metadata.path * Prevents 'undefined' errors in search output Fixes search returning 0 results despite indexed data.
- Add 11 integration tests for search functionality - Tests cover: stats, semantic search, thresholds, limits, sorting - Tests validate score ranges and metadata structure - Uses real indexed data from dev-agent repository Provides regression protection for search bug fixes.
- Add .dev-agent.json and .dev-agent/ to gitignore - Remove tsx devDependency (was only for debug script) - Update pnpm-lock.yaml
- Add practical Quick Start with npm link instructions - Include real search output from testing - Document threshold recommendations (0.7=precise, 0.25=exploratory) - Add explore command documentation - Show actual semantic search scores and results - Include pro tips for scripting and workflows Makes documentation reflect actual working functionality.
- Add 9 unit tests for distance-to-similarity conversion * Tests the core bug fix: score = e^(-distance²) * Validates score ranges, monotonic decrease, edge cases * Fast (<2ms), deterministic, always run in CI - Make integration tests skip in CI by default * Set RUN_INTEGRATION=true to run in CI if needed * Require pre-indexed data (.dev-agent/) * Run locally after `dev index .` Hybrid approach: Unit tests catch logic bugs, integration tests validate real behavior when available.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
🐛 Problem
Search was indexing 566 documents but returning 0 results for all queries, making the semantic search feature completely non-functional.
Root Cause
LanceDB returns L2 distance (~1.0 for similar vectors), but our code incorrectly calculated:
All results scored 0 and were filtered out by the threshold.
✅ Solution
1. Fix Distance-to-Similarity Conversion
score = e^(-distance²)2. Fix CLI Metadata Access
metadata.file→metadata.path3. Add Comprehensive Tests
4. Update Documentation
🧪 Verification
Before:
$ dev search "coordinator" --threshold 0.7 ✖ Found 0 result(s)After:
$ dev search "coordinator" --threshold 0.3 1. CoordinatorLogger (42.6% match) 2. Coordinator - The Central Nervous System (42.4% match) 3. CoordinatorLogger.info (35.6% match) ✔ Found 3 result(s)Test Results:
📊 Impact
Search Quality Examples:
RepositoryIndexervector embeddingshow do agents communicateerror handlingScore Interpretation:
🔗 Commits (Atomic)
fix(search): Correct distance-to-similarity calculationtest(search): Add 11 comprehensive integration testschore: Update gitignore, remove tsx dependencydocs: Update READMEs with real examplesEach commit builds and tests independently.
🚀 Next Steps
explore similarcommand (searches filename as text, not content)Dogfooded on dev-agent itself: All examples are real searches on this repository! 🐕